How to Combine Fast Heuristic Markov Chain Monte Carlo with Slow Exact Sampling
Authors
Abstract
Given a probability law π on a set S and a function g : S → R, suppose one wants to estimate the mean ḡ = ∫ g dπ. The Markov chain Monte Carlo method consists of inventing and simulating a Markov chain with stationary distribution π. Typically one has no a priori bounds on the chain's mixing time, so even if simulations suggest rapid mixing one cannot infer rigorous confidence intervals for ḡ. But suppose there is also a separate method which (slowly) gives samples exactly from π. Using n exact samples, one could immediately get a confidence interval of length O(n^{-1/2}). But one can do better. Use each exact sample as the initial state of a Markov chain, and run each of these n chains for m steps. We show how to construct confidence intervals which are always valid, and which, if the (unknown) relaxation time of the chain is sufficiently small relative to m/n, have length O(n^{-1} log n) with high probability.

1 Background

Let π be a given probability distribution on a set S. Given a function g : S → R, we want to estimate its mean ḡ := ∫_S g(s) π(ds). As we learn in elementary statistics, one can obtain an estimate of ḡ by taking samples from π and using the sample average of g as an estimator. But algorithms which sample exactly from π may be prohibitively slow. This is the setting for the Markov chain Monte Carlo (MCMC) method, classical in statistical physics and over the last ten years studied extensively as statistical methodology [4, 7, 9, 12]. In MCMC one designs a Markov chain on state space S to have stationary distribution π. Then the sample average of g over a long run of the chain is a heuristic estimator of ḡ. Diagnostics for assessing ...

¹ This material is based upon work supported by the National Science Foundation under Grant No. 9970901.
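The scheme in the abstract can be sketched concretely: draw n exact samples, use each as the initial state of an independent m-step chain, and build an interval from the n per-chain averages. The sketch below uses a toy discrete target and a simple Metropolis kernel (all names and the naive normal-theory interval are illustrative assumptions; the paper's always-valid interval construction is more sophisticated).

```python
import numpy as np

# Toy setting: pi is a discrete distribution on S = {0, ..., K-1}.
# The target, kernel, and interval here are illustrative choices,
# not the paper's construction.
rng = np.random.default_rng(0)
K = 10
weights = np.arange(1, K + 1, dtype=float)
pi = weights / weights.sum()           # target distribution pi
g = lambda s: float(s)                 # function whose mean we estimate

def exact_sample():
    """Stand-in for a slow exact sampler: draw directly from pi."""
    return int(rng.choice(K, p=pi))

def metropolis_step(s):
    """One step of a nearest-neighbour Metropolis chain with stationary law pi."""
    prop = (s + rng.choice([-1, 1])) % K
    if rng.random() < min(1.0, pi[prop] / pi[s]):
        return prop
    return s

def combined_estimate(n=50, m=200):
    """Start n chains from exact samples, run each for m steps,
    and average g along every trajectory."""
    chain_means = np.empty(n)
    for i in range(n):
        s = exact_sample()             # exact start => each chain mean is unbiased
        total = 0.0
        for _ in range(m):
            s = metropolis_step(s)
            total += g(s)
        chain_means[i] = total / m
    est = chain_means.mean()
    # Naive 95% normal-theory interval over the n independent chain means.
    half = 1.96 * chain_means.std(ddof=1) / np.sqrt(n)
    return est, (est - half, est + half)

est, ci = combined_estimate()
true_mean = float((np.arange(K) * pi).sum())   # = 6.0 for this toy target
print(f"estimate={est:.3f}, true={true_mean:.3f}, CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Because each chain starts exactly at stationarity, the n chain means are i.i.d. and unbiased for ḡ, which is what makes an honest interval possible even when the mixing time is unknown.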
Similar resources
Generalizing Elliptical Slice Sampling for Parallel MCMC
Probabilistic models are conceptually powerful tools for finding structure in data, but their practical effectiveness is often limited by our ability to perform inference in them. Exact inference is frequently intractable, so approximate inference is often performed using Markov chain Monte Carlo (MCMC). To achieve the best possible results from MCMC, we want to efficiently simulate many steps ...
Parallel MCMC with generalized elliptical slice sampling
Resampling Markov Chain Monte Carlo Algorithms: Basic Analysis and Empirical Comparisons
Sampling from complex distributions is an important but challenging topic in scientific and statistical computation. We synthesize three ideas, tempering, resampling, and Markov moving, and propose a general framework of resampling Markov chain Monte Carlo (MCMC). This framework not only accommodates various existing algorithms, including resample-move, importance resampling MCMC, and equi-ener...
Markov chain Monte Carlo methods for Dirichlet process hierarchical model
Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorised into marginal and conditional methods. The former integrate out analytically the infinite-dimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional metho...
Efficient Monte Carlo Methods for Conditional Logistic Regression
Exact inference for the logistic regression model is based on generating the permutation distribution of the sufficient statistics for the regression parameters of interest conditional on the sufficient statistics for the remaining (nuisance) parameters. Despite the availability of fast numerical algorithms for the exact computations, there are numerous instances where a data set is too large t...
Publication date: 2001